Normalize precomposed Unicode characters. #189

Merged: scossu merged 3 commits into main from decompose on Mar 26, 2025
scossu (Collaborator) commented Mar 16, 2025

This PR adds a normalization step to both S2R and R2S. It converts all precomposed characters in the source into their decomposed form (base character followed by its combining diacritics). With this step, conversion tables only need to address tokens in the decomposed form.

If this is the intended behavior for Scriptshifter, please approve.
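
As a rough illustration of the idea described above, the following is a minimal sketch only. It assumes the decomposition corresponds to Unicode canonical decomposition (NFD) via Python's standard `unicodedata` module; the PR text does not state which normalization form is used. The `decompose` and `lookup` helpers and the `TABLE` mapping are hypothetical names, not Scriptshifter's actual API or table data.

```python
import unicodedata
from typing import Optional


def decompose(text: str) -> str:
    """Split precomposed characters into base character + combining marks.

    Assumption: canonical decomposition (NFD) is the intended form.
    """
    return unicodedata.normalize("NFD", text)


# Hypothetical conversion table: keys are written in decomposed form only.
TABLE = {
    "e\u0301": "E1",  # e + COMBINING ACUTE ACCENT
    "a\u0300": "A2",  # a + COMBINING GRAVE ACCENT
}


def lookup(token: str) -> Optional[str]:
    """Normalize the token first, so one table entry covers both spellings."""
    return TABLE.get(decompose(token))
```

With this arrangement, a precomposed input such as `"\u00e9"` and its already-decomposed equivalent `"e\u0301"` resolve to the same table entry, so the table does not need a separate precomposed key.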

RandyBarry (Collaborator) commented Mar 16, 2025 via email

scossu (Collaborator, Author) commented Mar 16, 2025

Would it be safe to apply to R2S only?

RandyBarry (Collaborator) commented Mar 17, 2025 via email

scossu merged commit 5eea9a9 into main on Mar 26, 2025
scossu deleted the decompose branch on July 12, 2025 23:44

3 participants